RCI Overview Agenda
• Topic Follow-on Recommendations

• Team Membership and Recognition
Motivation for RCI Guide


Some root cause failure investigations failed to identify the true root cause

- As a result, the failure recurred after corrective actions were implemented for
what was believed to be the root cause

Projects and teams may lack leadership and guidance documents on
performing root cause analysis (RCA)

Variability in the RCA techniques used can result in ineffective or inefficient
root cause investigations

For the National Security Space community, no recognized RCI best
practice exists
Summary


This guide has been prepared to help determine what methods and
software tools are available when significant, detailed root cause
investigations are needed, and what level of rigor is appropriate to
reduce the likelihood of missing the true root causes. For this
report, a root cause is the ultimate cause or causes that, if eliminated,
would have prevented the occurrence of the failure. In reality, many
failures require only one or two investigators to identify root causes and
do not demand an investigation plan that includes many of the
practices defined in this document.

During ground testing and on-orbit operations of space systems,
programs have experienced anomalies and failures where
investigations did not truly establish definitive root causes. This has
resulted in unidentified residual risk for future missions.
Notional Failure Recurrence Example


[Figure: two panels, each titled "Payload Fairing Failed to Separate"]

First panel: 4 possible root causes identified and corrective action implemented

Second panel: 2 likely intermediate causes identified; unable to identify root cause

Several technical recommendations made


Failure to Identify True Root Cause Increases Mission Risk

Goal of RCI Best Practices Guide is to Improve Identification
of True Root Causes and Minimize Mission Risk


U.S. Space Program Mission Assurance Improvement Workshop
RCI Topic Team Charter


• Establish a cross-industry and government team to formulate
foundational information and recommended best practices for Root
Cause Investigations (RCI) focused on the space industry
Root Cause Investigation Purpose and Scope


During ground testing and on-orbit operations of space systems,
programs have experienced anomalies and failures where
investigations did not truly establish definitive root causes. This has
resulted in unidentified residual risk for future missions.

This guide focuses on the RCA elements of the broader Root Cause
Corrective Action (RCCA) process, per request of the Steering
Committee. The corrective action process is not discussed in this guide.
Root Cause Investigation Guide
Traceability to Steering Committee


Deliverable Requested                                   Location in RCI Guide
------------------------------------------------------  ----------------------------
Overview of basis for RCAs, definitions and             Sec 2, Sec 3, Sec 8, Sec 5.3
  terminology, commonly used techniques, needed
  skills/experience
Key early actions to take following anomaly/failure     Sec 5.0
Data/information collection approaches                  Sec 6.0
Structured RCA approaches - pros/cons                   Sec 8, Table 11
Survey/review of available RCA tools                    Sec 9.0
Guidance on criteria for determining when an RCA is     Sec 10.0
  sufficient (when do you stop)
RCA of on-orbit vs. on-ground anomalies                 Sec 11.0
Handling of root cause unknown and unverified failures  Sec 12.0
Root Cause Investigation Best Practices Guide


Key Early Actions following the failure/anomaly are included in the top
chevron bar, and the balance of the RCCA process is included in the bottom
chevron bar. Note that this document only addresses those actions that
significantly affect the Root Cause Analysis step (noted in bold blue):


[Figure: RCCA process chevron diagram. Steps legible in the source: 3. Communication & Documentation; 5. FRB & Other Process Initiation; 9. Corrective Action Plan; 10. Implement & Verify; 11. Validate Actions Effective; Return to Step #6 if Problem Recurs]

• Focus is on Root Cause Analysis methods and tools utilized by the RCI
Space Systems Team and companies

- No magic methods or tools; the team identified common issues and facilitation techniques

- Many ground failure RCIs validate the true root cause quickly (hardware available)

- A few complex anomalies benefit from a combination of items in the RCI guide


RCI Guide Details


• This Guidance Document summarizes root cause investigation best
practice recommendations and key takeaways for use with simple or
complex on-ground or in-orbit failures and anomalies.

• The industry Core Team focused on combining the most effective root
cause investigation approaches from each company into a usable
format.

• The guide provides pros and cons for RCA methods and describes RCA pitfalls.
RCI Guide Details - Cont


• Preservation of the “scene” of the failure: Don’t contaminate
evidence (for example, by immediately reworking the affected unit to restore
operation if doing so would affect the failure investigation)

• Immediate Data Collection: Interviews, observations, measurements,
audio/video, chart recordings, all relevant data, etc.

• Determine Team Composition (as appropriate): Ideally 6-8 people,
including Process Performers (operators, technicians, etc.),
Subject Matter Experts (engineers, scientists, etc.), Customer or Rep
(Quality, Mission Assurance, etc.), RCCA Facilitator, and Team
Leader/Chair; define roles (RACI - Responsible, Accountable,
Consulted, Informed and/or RAA - Responsibility, Accountability,
Authority)
RCI Guide Details - Cont


• Problem Definition (“Problem Definition Worksheet”): Define,
understand and agree upon the facts surrounding the anomaly: Title,
Customer, What happened (be specific)?, When did it happen (start
date/time - stop date/time)?, Where did it happen (be specific)?, How
often did it happen?, Was it repeatable under specific conditions?,
Importance/Significance?; Avoid “Who” (may inhibit cooperation if
people think they are being blamed); Avoid “Why” and “How” (the RCCA
process will determine these)

• Brainstorm potential causes/contributing factors, including use of the
Data Collection above: Classify data utilizing a KNOT Chart; may also
use Affinity Diagram, Pareto or Scatter Chart, Histogram, or other tools;
assign actions, ECDs, etc.
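
The worksheet questions above can be captured as a simple record so that an investigation team can see which facts are still undefined. The following is a minimal sketch only: the class name, field names, and the `missing_fields` helper are hypothetical and do not come from the guide, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProblemDefinition:
    """Facts surrounding an anomaly (illustrative field names, not from the guide)."""
    title: str = ""
    customer: str = ""
    what_happened: str = ""          # be specific
    when_started: str = ""           # start date/time
    when_stopped: str = ""           # stop date/time
    where: str = ""                  # be specific
    how_often: str = ""
    repeatable_conditions: str = ""  # was it repeatable under specific conditions?
    significance: str = ""
    # Deliberately no "who", "why", or "how" fields: the worksheet avoids them.


def missing_fields(worksheet: ProblemDefinition) -> List[str]:
    """Return the names of worksheet fields that are still blank."""
    return [f.name for f in fields(worksheet)
            if not getattr(worksheet, f.name).strip()]


if __name__ == "__main__":
    # Hypothetical example values.
    ws = ProblemDefinition(title="Fairing separation anomaly",
                           what_happened="Payload fairing failed to separate")
    print(missing_fields(ws))
```

Leaving out "who", "why", and "how" fields keeps the record aligned with the worksheet's intent of agreeing on facts before analysis starts.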
RCI Guide Details - Cont


• Root Cause Analysis (RCA): There is almost always more than one root
cause, and the level of rigor is determined by the complexity of the problem.
Consider a “Severity/Recurrence Risk Cube” analysis to select the RCA approach

- RCA Techniques (may use one or more): Timeline, 5-Whys, Apollo,
Fishbone (Ishikawa) Cause and Effect Diagram, Process Analysis or
Process Classification Method including Process Mapping or Logic Flow
PERT Chart, Advanced Cause and Effect Analysis (complex Cause/Effect
relationships in Fault/Failure Tree format), RCA “Stacking” (apply multiple
techniques) or Interaction of methods (e.g., Fishbone diagram followed by
5-Whys on the identified RC from the Fishbone)

- Special Considerations for Space RCA: Low-volume, high-value assets
vs. serial production situations, Pre-launch vs. on-orbit investigations,
Managing schedule pressures and team technical independence

- Unverified Failures (UVF): What to do when root causes cannot be
determined (in this context the UVF discussion is often about “how far do
you go in looking for root cause?”)


Reasons For Missing True Root Cause


• Incorrect team composition

• Incorrect data classification

• Lack of objectivity/incorrect problem definition

• Cost and schedule constraints

• Rush to judgment

• Lack of management commitment


Details on RCA Pitfalls Covered in Exec Overview and Section 13
Team Facilitation Techniques

• Knowledge of group dynamics

- Ability to “read” the team (confusion, progress, intimidation)

- Ability to create a safe environment

- Ability to deal with disruptions and intimidation

• Ability to determine if the team is diverse enough

• Approach the problem from both right-brain creative and left-brain
logical perspectives

• Classify data accurately (KNOT)

• Use the RCA tool with which the team is most comfortable

• FOLLOW THE PROCESS (deviation introduces risk)
RCA Level Based on Risk Matrix


          Low                               Medium         High
High      Level 3 RCA                       Level 4 RCA    Level 5 RCA (Highest Risk Items)
Medium    Level 2 RCA                       Level 3 RCA    Level 4 RCA
Low       Level 1 RCA (Lowest Risk Items)   Level 2 RCA    Level 3 RCA

Columns: Recurrence (Likelihood of the Event Recurring)


Failure Risk Matrix used to Determine RCA Rigor Needed
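
The matrix above can be read as a simple two-key lookup. The sketch below is illustrative only: the function and variable names are hypothetical, and the row axis is assumed to be severity (the slide text does not name it; the guide elsewhere refers to a "Severity/Recurrence Risk Cube").

```python
# RCA level by (row rating, recurrence likelihood), transcribed from the matrix.
# Assumption: the row axis is severity; the slide text does not label it.
RCA_LEVEL = {
    "High":   {"Low": 3, "Medium": 4, "High": 5},
    "Medium": {"Low": 2, "Medium": 3, "High": 4},
    "Low":    {"Low": 1, "Medium": 2, "High": 3},
}


def rca_level(severity: str, recurrence: str) -> int:
    """Return the RCA level (1-5) for a severity/recurrence rating pair."""
    try:
        return RCA_LEVEL[severity][recurrence]
    except KeyError:
        raise ValueError(
            f"ratings must be 'Low', 'Medium' or 'High', "
            f"got {severity!r} and {recurrence!r}"
        ) from None


if __name__ == "__main__":
    print(rca_level("High", "Medium"))
```

An explicit table is used rather than arithmetic on the ratings so that the code mirrors the slide cell by cell and is easy to audit against it.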
RCA Level by Impact Matrix


Columns: RCA Level; Impact; Commonly used Data Collection & RCA Methods; Typical Analysis Span; Output Artifacts (as required)

• RCA Level 5 (Impact: High-High)

- Methods: KNOT Chart, Event Timeline, Process Mapping, Cause Mapping, Fishbone Diagram, Advanced Cause & Effect Analysis, Fault Tree Analysis

- Typical Analysis Span: 2-6 weeks (or longer)

- Output Artifacts: RCA Findings and Conclusions, Validation and Measurement Strategy, Illustration of Root Cause Analysis, Company-wide communications

• RCA Level 4 (Impact: High-Medium, Medium-High)

- Methods: KNOT Chart, Event Timeline, Process Mapping, Cause Mapping, Fishbone Diagram, Advanced Cause & Effect Analysis

- Typical Analysis Span: 4 days - 2 weeks

- Output Artifacts: RCA Findings and Conclusions, Validation and Measurement Strategy, Illustration of Root Cause Analysis, User community communications

• RCA Level 3 (Impact: High-Low, Medium-Medium, Low-High)

- Methods: Brainstorming, Event Timeline, Cause Mapping, Fishbone Diagram

- Typical Analysis Span: 1-3 days

- Output Artifacts: RCA Findings and Conclusions, Validation and Measurement Strategy, Illustration of Root Cause Analysis, Affected people communications

• RCA Level 2 (Impact: Low-Medium, Medium-Low)

- Methods: 5-Whys, Brainstorming, Fishbone Diagram

- Typical Analysis Span: 0.5-1 day

- Output Artifacts: RCA Findings and Conclusions, Affected people communications

• RCA Level 1 (Impact: Low-Low)

- Methods: 5-Whys, Brainstorming

- Typical Analysis Span: 1-4 hours

- Output Artifacts: RCA Findings and Conclusions, Affected people communications


Impact Matrix Provides Guidance on Applicable RCA Methods
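
As a rough illustration of how the impact matrix might be used when planning an investigation, the sketch below stores the typical analysis span and commonly used methods per RCA level and reports which methods are added when moving to a higher level. The names `RcaProfile`, `RCA_PROFILES`, `profile_for`, and `added_methods` are hypothetical; only the table values come from the matrix above.

```python
from typing import NamedTuple, Tuple


class RcaProfile(NamedTuple):
    typical_span: str
    methods: Tuple[str, ...]


# Values transcribed from the impact matrix; the structure is an assumption.
RCA_PROFILES = {
    1: RcaProfile("1-4 hours", ("5-Whys", "Brainstorming")),
    2: RcaProfile("0.5-1 day", ("5-Whys", "Brainstorming", "Fishbone Diagram")),
    3: RcaProfile("1-3 days", ("Brainstorming", "Event Timeline",
                               "Cause Mapping", "Fishbone Diagram")),
    4: RcaProfile("4 days - 2 weeks",
                  ("KNOT Chart", "Event Timeline", "Process Mapping",
                   "Cause Mapping", "Fishbone Diagram",
                   "Advanced Cause & Effect Analysis")),
    5: RcaProfile("2-6 weeks (or longer)",
                  ("KNOT Chart", "Event Timeline", "Process Mapping",
                   "Cause Mapping", "Fishbone Diagram",
                   "Advanced Cause & Effect Analysis", "Fault Tree Analysis")),
}


def profile_for(level: int) -> RcaProfile:
    """Return the typical span and commonly used methods for an RCA level."""
    if level not in RCA_PROFILES:
        raise ValueError(f"RCA level must be 1-5, got {level!r}")
    return RCA_PROFILES[level]


def added_methods(lower: int, higher: int) -> Tuple[str, ...]:
    """Methods listed for `higher` that are not listed for `lower`."""
    base = set(profile_for(lower).methods)
    return tuple(m for m in profile_for(higher).methods if m not in base)


if __name__ == "__main__":
    print(profile_for(3).typical_span)
    print(added_methods(4, 5))
```

The listed methods are described in the matrix as commonly used, so a lookup like this is a starting point for planning rather than a prescription.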
Intended RCI Guide Use


• Primary use

- Guide for RCI teams and sponsors on space-related investigations

- Help develop an effective RCI plan and depth of rigor

• Publicize RCI guide at conferences

- International Symposium for Testing and Failure Analysis (ASM)

- International Reliability Physics Symposium (IEEE)

- Reliability Availability and Maintainability Symposium (RAMS)

- And others

• Specific recommendations for industry:

- Incorporate best practices in corporate command media

- Use as a reference to subcontractors to set expectations and improve
communications

• Specific recommendations for government:

- Use as a reference to contractors to set expectations and improve
communications

- Consider using as a reference in program offices and RFPs